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In application areas like bioinformatics multivariate distributions on angles are encoun- 
tered which show significant clustering. One approach to statistical modelling of such situa- 



tions is to use mixtures of unimodal distributions. In the literature (Mardia et al. 2011 1, the 



multivariate von Mises distribution, also known as the multivariate sine distribution, has 
been suggested for components of such models, but work in the area has been hampered by 
the fact that no good criteria for the von Mises distribution to be unimodal were available. 
In this article we study the question about when a multivariate von Mises distribution is uni- 
modal. We give sufficient criteria for this to be the case and show examples of distributions 
with multiple modes when these criteria are violated. In addition, we propose a method to 
generate samples from the von Mises distribution in the case of high concentration. 
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1 Introduction 

In biochemistry it is well known that structure of macro-molecules such as proteins, DNA, and 
RNA can be described in terms of conformational angles. For proteins, these angles could be the 
dihedral and bond angles describing the conformation of the backbone together with additional 
angles for the configuration of the side chains (see e.g. Branden and Tooze 1998). Data sets 



consist of the angles to describe each monomer in a macro-molecule, the number of angles 
required to give the conformation of a monomer determines the dimensionality of the problem. 
In non-coding RNA there can be 7 to 8 dihedral angles of importance per amino acid (Frellsen 



et al. 2009i ) and, if the side chains angles are included, many angles are required for amino acids 



in proteins {e.g. Harder et al. 2010 ). The resulting distributions on angles are multivariate, often 



highly structured, featuring various modes together with regions excluded by steric constraints 
{e.g. 



Mardia et al. 2011). 



One way to approach the statistical modelling of such multimodal, multivariate distributions 
is to use mixture models with unimodal components. In the Euclidean space W, an obvious 
choice for the components is to use normal distributions with appropriately chosen covariance 
matrices. For angular data, as considered in this article, the choice of component distribution 
in less clear, but a simple analogue of the multivariate normal distribution is the multivariate 
von Mises distribution (Mardia et al. 2008). This distribution is suggested for mixture mod- 



elling in Mardia et al. (2011). In order for a mixture model to be a useful description of a 
multimodal distribution, it is essential that the component distribution is unimodal. In case of 
the multivariate von Mises distribution, this constraint excludes some of the parameter range. 
Previous work has been complicated by the problem that no characterisation of the parameter 
values corresponding to the unimodal case was available. To solve this problem, this article pro- 
vides sufficient criteria for the multivariate von Mises distribution to be unimodal and we show 
examples of distributions with multiple modes (where these criteria are violated) . It should be 



noted that univariate circular distributions are well established (see, for example, Mardia and 
Jupp 2000 ) but understanding of multicircular distributions is still evolving. 
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The multivariate von Mises distribution, first introduced in Mardia et al. (2008) and also 
known as the multivariate sine distribution, is denoted by MVM(/j,, k, A). It is a distribution on 
the torus = [0, 2Tr)P and is given by the density 



V{d; P^, ^, A) - exp{>Jc{0) + ^sie f As{0)) (1) 

for all 9 e T*'. Here we use the abbreviations Ci{9) = cos(^i — /i^) and Si{6) — sin(6'i — ^i) for 
i — 1, . . . ,p and Z(k, A) is the normalisation constant. The parameters of the distribution are 
the "mean" G T^, the "concentration parameter" k G with k,; > for i = 1, . . . ,p and 
A = (Xij) G M.P^'P with A^ = A and A,, = for i = 1, . . . ,p. 

From the form of the density it is obvious that whenever k is "large" compared to A, the 
density will have exactly one maximum (where the vector c{6) is approximately aligned with n) 
and exactly one minimum (where c{6) is approximately aligned with —k). This effect is studied 
in section [2] where we give a sufficient criterion for the distribution to be unimodal. Conversely, 
for small k the quadratic term {9)As{9) in the density ip dominates and one expects the 
occurrence of multimodal distributions. This situation is studied in section [3] where we show, by 
example, that a high number of modes is possible even in low dimensions. Finally, in section [4j 
we give an algorithm for generating samples of a MVM(/i, k. A) distribution for the unimodal 
case. This will be required as part of any algorithm to sample from a mixture model with 
MVM(/x, K, A) components. 



2 High Concentration 

In this section we derive a sufficient criterion for the MVM(//, /t. A) to be unimodal. Since the 
exponential function exp in the density ([I]) is strictly monotonically increasing and since the 
normalisation constant Z(k, A) does not depend on 6, it suffices to consider the extrema of 

fi9) = n'cie) + \s{9yKs{e) (2) 

instead. These can be found by setting the partial derivatives 

p 

dzf = -K^Si + Q ^ XikSk (3) 

equal 0: Since T^* is a compact, closed manifold, all local extrema of / are located a,t 9 £ 
with dif{9) = for i = 1, . . . ,p, ie. at critical points of /. 

To characterise the critical points of /, we consider the second derivatives 

p 

dijf = - (^KiC^ + 51 ^ikSkjSij + CiXijCj (4) 
fc=i 

where Sij denotes the Kronecker delta. If the Hessian matrix Hf{9) — {dijf{6))i,j at a critical 
point 9 is negative definite, is a local maximum of / and thus of ip{ - ; fi, n, A); if the Hf{9) is 
positive definite, is a local minimum; finally, if Hf{9) has both positive and negative eigenvalues, 
the point is a saddle point. 

For reference in the arguments below, we note that the biggest eigenvalue Amax of a symmetric 
matrix A ~ (aij) G W^'f satisfies 

Amax = sup Ax > max ej Aci = max an. 

x£TS.P,\x\ = l i=l,...,p 1=1,. ..,p 

In particular, if the Hessian matrix at a critical point 9 has a positive diagonal element, it has 
at least one positive eigenvalue and thus 9 cannot be a local maximum. Similarly, the smallest 
eigenvalue Amin satisfies Amin < niini=i_...^p a^i and if Hf{0) has a negative diagonal element, 9 
cannot be a local minimum. 
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Proposition 1. Assume that the matrix 



P = diag(Ki, . . . , Kp) - A 

is positive definite. Then the global maximum of (p — (p( - ; fi, A) is attained at 9 = ^ and (p 
has no other (local) maxima. 

Proof. For = fi we get V/(/i) = and Hf{fj,) = — P; by assumption, this matrix is negative 
definite and thus = fi is a local maximum. We now show that this is the only local, and thus 
the global, maximum of /. 

Since P is positive, the smallest eigenvalue A,nin of P satisfies < Amin < '[DiT^i=i,...,p Pa = 
minimi . ..^p Ki and thus we have Ki > for i = 1, . . . ,p. From equation ^ we see that dif{6) — 
implies ci ^ and consequently ^\ikSk = ^^iSijci. Substituting this into the expression for 
dijf in Q we find that the Hessian matrix Hf at a critical point has the elements 



If a critical point 6 has Ci{9) < for an index i g {1, . . . ,p}, then Hf{9)ii = —Ki/ci > and 
thus 9 cannot be a local maximum. Therefore we can assume Ci{9) > for i = 1, . . . ,p. 
Using the notation 

P,„ = aM-^.....-i^)-A (5) 

we can equivalently re-write the condition V f{9) = as 

P{9)s{9) = 0. (6) 

Since 

P{9) - P + diag(Ki(- - 1), ... , - 1)) 

is the sum of two positive matrices, it is positive and in particular non-singular. Thus, the only 
solution of (|6| is s = which implies that the maximum aX 9 = ^ \a the only critical point with 
Ci > for i = 1, . . . ,p. This completes the proof. ■ 

From the proof of proposition [T] we sec that, if 6* = /i is the global and thus a local maxi- 
mum of <y9, the matrix P must be positive semi-definite, i.e. the positivity condition is almost 
equivalent to having the global maximum at 0. The following corollary gives a sufficient (but 
not necessary) condition for the statement to hold; the given condition is often easier to verify 
in practice. Coincidentally, this stronger condition allows to also identify the minima of the 
von Mises density if. 

Corollary 2. Assume 

p 

Ki>'^\\ij\ for all i = I, . . . ,p. (7) 

Then the global maximum of ip = p{ - ; fi, k, A) is attained at 9 = pL, the global minimum is at 
9 = (/ii + TT, . . . , /ip + tt) and these two points are the only (local) extrema of ip. 



Proof. By the Gershgorin theorem (Horn and Johnson |1985[ Theorem 6.1.1), the eigenvalues of 



P are contained in the union of the closed discs B{Ki,ri) C C with radii = I ~ foi' 

i — 1, . . . ,p. Since P is symmetric, its eigenvalues are real and since we have Ki > = 
all eigenvalues of P are positive. Thus the condition of the proposition is satisfied and 6* = is 
the global maximum of p. 

Similarly, the eigenvalues of the matrix P{9) from ([5| are contained in the union of the closed 
discs B(Ki/ci, ri) C M with radii = X^j^j I ~ ^"^^ « = 1, . . . ,p. Since we have 

I — I ^ 'ti > ^ ^ I Aij I = , 

Ci 



3 



none of the discs contain and the matrix P{6) cannot have as an eigenvalue. This shows 
that aU solutions of (|6|, i.e. the critical points of /, satisfy s{d) — and thus c{6) e { — 1, 1}''- 

To classify the critical points, we consider the Hessian matrix Hf — {dijf{d))^_^. Invoking 
the Gershgorin theorem again, the eigenvalues of Hf are contained in the union of the closed 
discs with centres diif{6) and radii ^j-a \dijf{0)\ for i = 1, . . . ,p. Since we have 

\duf{0)\ = \l^^C^{e)\ = |«;,| >;^|A,,| -^|C,(0)A,,C,(0)| 

none of these discs contain and thus the circles corresponding to i with q = 1 and with Ci ~ ~1 
respectively form two disjoint groups. We can conclude that for each i with = 1 the matrix 
Hf has a negative eigenvalue and for each i with q = — 1 the Hessian has a positive eigenvalue. 
Consequently, = is the only local maximum of f, 9 = (fii + w, . . . , iip + tt) is the local 
minimum of / and all other critical points are saddle points. I 

It is easy to see, that the statements about the minimum in corollary [2] do not necessarily 
hold under the weaker assumption from proposition [T] For example, the matrix 

/ -2 2\ 
A -2 2 
\ 2 2 0/ 

has eigenvalues —4, 2 and 2. Thus, for k ~ (3, 3, 3) the matrix P is positive (the eigenvalues are 
1, 1 and 7), and the assumption of proposition [T] is satisfied. On the other hand, the Hessian 
matrix of / at = (/ii + tt, . . . , /ip + tt) is Hf(9) — diag(Ki, K2, k^) + A and, since this matrix 
is not positive semi-definite (the eigenvalues are —1,5 and 5), the minimum of the distribution 
cannot be at (vr, . . . , tt). 

3 Low Concentration 

In this section we consider the case of "small" k. In this case the structure of the extrema 
of a MVM(^, /c. A) distribution is much more complicated than for the concentrated case. We 
illustrate some of the possible scenarios with the help of examples, starting with the boundary 
case K = (0, 0, . . . , 0) and then considering small but non-zero k. 

The following lemma shows that for k = the case of a single global maximum can never 
occur. 

Lemma 3. For k — 0, the following statements hold: 

1. The density of the multivariate von Mises distribution MVM(/i,0,A) takes its maximal 
value on the set {^tt, |7r}P C T^, i.e. 

sup (/?(6'; /i, 0, A) = sup (p{9; fi,0, A). 

2. If 9 is a maximum, then so is {9i + t:, . . . ,9p + t:). In particular the number of isolated 
maxima of f is always even (and thus cannot be 1). 

Proof. Without loss of generality, we can assume /i = 0. Let 9* S T'' be a global maximum of 
ip{ - ;0,0, A). As in proposition [l] this is equivalent to 9* being a maximum of the function / 
from equation Since we assume k = 0, the formula for / simplifies to 

fi9) = lsi9VAsi9) (8) 
and the partial derivatives of / are given by 

p 

^^f{9)^c,{9)Y,>^^kSk{e). (9) 
fc=l 
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Let i e {1, 2, . . . ,p}. Since A.^^ — 0, the value J2k=i ^ikSkid) does not depend on 9i and thus 
&i ^ dif{9) can only change sign at the points 9i — ^tt, ^tt. Consequently, 9i i-> f{9) changes 
monotonically between the values 9i = ^tt, ^tt. Defining 9^ and 9~ by 9^ — ^tt, 6*,^^ = |7r, and 
6»+ = 6*^- = 6** for j ^ i this shows that one of the two inequalities f{9+) > f{9*) > f{9-) 
and f{9^) > f{9*) > f{9^) holds. Since 9* is a global maximum of /, equality holds in the 
upper bound and thus either 0"*" or 9~ is also a global maximum. By repeating this procedure 
for i = 1, 2, . . . ,p we find a global maximum where each coordinate is in the set TT, Itt}. This 
completes the proof of the first statement. 

The second statement is a direct consequence of the fact that the function / from ([s]) is 
invariant under the map 9^9 + (tt, . . . , tt). ■ 

Lemma 4. Let k — Q and A 7^ 0. Then every global maximum 9 of ip{9] fi,0, A) satisfies 

PWIU = 1. 

Proof. Since the trace of a matrix equals the sum of its eigenvalues and since A is a non-zero 
matrix with zero trace, A must have a strictly positive eigenvalue A. Let a; be a corresponding 
eigenvalue with ||a;||oo < 1- Then we can find 9 = {9i, . . . , 9p) with sm{9i) = Xi for i = 1, 2, . . . ,p. 
This vector 9 satisfies 

f{9) = \s{eYks{9) = \s{9ys{9) > 0. 

Consequently the maximal value of / is strictly positive. 

Now let 9 eTP with fi9) > and ||s(6')||oo < 1, «.e. \sii9)\ < 1 for all i e {1, . . . ,p}. Let 
c = l/||s(^')||oo > 1 and s — cs{9). Since ||s||oo = 1, we can find 9 E T'p with sm{9i) = Sj for 
i — 1,2, ... ,p. This point satisfies 

fi9) = ^s^As = c'^3{9)'^As{9) > f{9). 
Therefore, 9 cannot have been a maximum of /. ■ 




Example 1 (k = 0, two isolated modes). Consider a MVM(//, 0, A) distribution with 

A = 

Since k = 0, lemma |4] applies and shows that all maxima of (/3( • ; /x, 0, A) correspond to 9 where 
s{9) lies on the surface of the cube Q = [—1, 1]^. Thus, we can find the local extrema of / by 
first finding the local extrema of g{s) — ^s^As on the surface of Q and then identifying the 
corresponding values 9. To aid with finding the maxima of 5, figure [l] shows a plot of g on 
the (unwrapped) surface of Q. In the figure, the top-most square corresponds to S3 = -fl, the 
centre square to §2 = —1, the right-most square to Si — -1-1 and so on. One can see that the 
distribution has two modes, corresponding to s{9) = (—1, —1, —1) and s{9) = (+1, +1, +1). 

Example 2 (k = 0, one extended mode). Consider a MVM(/i,0, A) distribution with 

A 




We can find the modes of this distribution in the same way as we did in example 1, the corre- 
sponding plot of g is shown in figure [2] The figure shows that the density of this MVM(/i, 0, A) 
distribution has an extended maximum which forms a loop on the surface of the cube. Figure [2] 
shows the density as a function of s{9). To given an idea of the distribution of the corresponding 
angles 01,^2,^3 themselves, we show a scatter plot of a sample in figure |3] While this (much 
more conventional) diagram shows the distribution of the sample clearly, comparison with fig- 
ure [2] makes it clear that the structure of the mode is difficult to understand from a scatter plot 
alone. 

The case of small, non-zero k can be seen as a perturbation of the case k = 0. Such a 
perturbation would normally just shift the extrema of the density around, but the following 
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Figure 1: Visualisation of a von Mises density from example 1, as a function of s{6) restricted 
to the surface of the cube [0, 1]^. The plot shows that the distribution has two modes. 
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7 




Figure 3: Scatter plot of 1000 samples from the distribution from example 2. To m,ake the 
structure of the maximum more visible, the matrix K was multiplied by 10, i.e. the plotted sample 
is from a MVM(0, 0, 10 A) distribution. The regions where the scatter plots have higher intensity 
are not isolated modes of the distribution but are artefacts caused by the projection of onto 
where straight segments of the extended maximum are seen "head-on". 
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example shows that such a perturbation can also break a spatially extended maximum into a 
set of isolated maxima, thus increasing the number of modes. 

Example 3 (k > 0, six isolated modes). The maximum of the von Mises distribution illustrated 
in figure [2] lives on a "ring" formed as the union of six lines in T"^, aligned with the grid {^tt, |7r}'^. 
Since the Ci are zero on the grid and take their maxima between the grid points, we would expect 
that adding a perturbation term c{9) with small k will not only shift these lines, but will also 
collapse this extended maximum into a collection of isolated maxima which live on the shifted 
lines, at the point where the perturbation was maximal. The following, explicit example gives a 
von Mises distribution in with six isolated maxima. 
Let ?7 > and e = sin(?7). Define 
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We will show that, for small enough rj, the function f{9) = kJ c{9) + }^s{9)^ Ks{9) has local 
maxima at the six points 9^ ^ ... ,9^ e T'^ given by the following table. 
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For the convenience of the reader, the table also gives the vectors s{9^) and c{9^) iox I — 1, . . . , 6. 
By substituting these values into the formula for dif from equation ([s]), it is easy to check that 
= for ^ = 1, . . . , 6 and thus all six points are critical points of /. 
Substituting the values from the table into the formulas for dijf from Q, we can compute 
the value of the Hessian matrix Hi = Hf{9^) for £ = 1, . . . , 6. The results are as follows: 




It can be checked that each of these matrices has eigenvalues Ai = — 1 + C'(£2), A2 = — 1 + C'(£2), 
and A3 = — £ + C'(£2). Thus, for small enough £ > 0, all six points are local maxima as required. 



4 Sampling 

In this section we discuss a simple method to generate samples from a MVM(/i, k. A) distribution. 



using the rejection sampling algorithm (Robert and Casella 2004 Corollary 2.17). The method 



is restricted to small or moderate p, but works well for the case of high concentration. We 
assume that the matrix 

P = diag(Ki, . . . , Kp) - A 

is positive definite. 

Without loss of generality we can assume fi = 0, the general case is then obtained by a 
simple shift. We denote the smallest eigenvalue of P by Amin > 0. The proposed algorithm uses 
independent angles 9i,92, ■ . ■ ,9p as proposals, distributed with density 

fr exp(Vcos(2g)) 
27r/o(^) 

This is the independent product of one-dimensional von Mises distributions, modified by re- 
placing the angle 9 by 29. Since we can efficiently generate samples 9i from a one-dimensional 
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von Mises distribution VM(0, Amin/4) {e.g. Best and Fisher 1979[ ), we can obtain samples from 
the density g by taking 9i — 9/2 with probabihty 1/2 and 9i = 9/2 + tt else. 

The target density is the density of the multivariate von Mises distribution MVM(0, k, A), 
i.e. it is proportional to 

f{9)^exp(^K^c{9) + ^s{9)'^As{9)) 

with Ci{9) — cos{9i) and Si{9) — sin(6'i) for i — l,...,p. Using the inequalities cos{9) + 
sin(6i)V2 < 1 and sl9yPs{9) > X^ins{9y s{9), we find 



f{9) = cxp(^K^c{9) + ^s{9)'^As{9) 



^ \ ■ 

^inin 



1=1 

Finally, since cos(2a;) = 1 — 2sin(x)^, we can rewrite this expression as 

p 



i=l i=l 



i=l 



Cg{9). 



Thus we have found a constant C with / < Cg and the rejection sampling algorithm can be 
applied. 

In the rejection sampling algorithm, a proposal 9 is accepted with probability f{9)/Cg{9), 
i.e. with probability 

expfK^c(6i) + ^s{9yAs{9) 
p{9) - ^ 



exp(j:P^,K,-^s{9ys{9) 

CXp(^E K^{ci - 1) + 2*^(^ + ^min-^)s 



where / is the p x p identity matrix. Thus, the following algorithm can be used to generate 
samples of a MVM(/i, k, A) distribution when P is positive: 

1. Generate random variables 

VM(0,A,„i„/4) 
5i,...,6n with P{5i = 0) = P{6, = tt) = 1/2 

all independent of each other. 

2. Let 9i ^ 9,/2 + 6, for i = l,2,...,p. 

3. Let Si — sm{9i) and q = cos(0i) for i = 1, 2, . . . ,p. 

4. If the condition 

p 1 
U < exp(E K,{c, - l) + -s^(A + Ami„/)s) 
1=1 

is satisfied, output 9 = + /ii, 6*2 + /i2, ■ . . ,9p + jip) [i.e. the proposal is accepted). 
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5. Return to step [T] 



We note that the algorithm stiU works when the eigenvahie Amin is replaced by a lower bound 
< Amin < Amin for the eigenvalues of P. This allows to apply the algorithm in situations where 
the eigenvalues of P are not exactly known. 

The efficiency of this algorithm is determined by its acceptance rate: If Z is the normalisation 
constant which makes -^f a probability density, then each proposal is accepted with probability 



Z/C. From Mardia et al. (2011 equation (3)) wc know that, for high concentration, we have 



Z« (27r)P/2|F| 



-1/2 



p 



exp(^2^K, 

i=l 



where \P\ is the determinant of the matrix P. From Abramowitz and Stegun ( 1964 formula 9.7.1) 
we know 

V27me~''/o(K) — > 1 

as K — oo. Consequently, the asymptotic acceptance probability for high concentration is 




i2nr/'\P\-'/'e^p{J:tl^^) 

/4 + ELi «0 ■ (2^)f/2(4/A,„in)f/2 exp(pA„,i„/4) 



(10) 



The proposed algorithm will be efficient if this probability is not to small. Considering the first 



factor on the right-hand side of (10 1, we see that the method only can be expected to perform 



well for sufficiently small values of p. The factor 1/2^ is expected, since the proposal distribution 
has 2^ modes, whereas the target distribution has only one. Since the determinant |P| equals 
the product of all p eigenvalues of p (the smallest of which is Amin), the second factor on the 
right-hand side of (10 1 is big, if the eigenvalues of P are all of the same magnitude, i.e. if the 



mode of the distribution is approximately rotationally symmetric. 
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